Synthetic Instruction
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
Yu, Ping, Lanchantin, Jack, Wang, Tianlu, Yuan, Weizhe, Golovneva, Olga, Kulikov, Ilia, Sukhbaatar, Sainbayar, Weston, Jason, Xu, Jing
We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on given seed tasks, and then generate a new synthetic example of similar quality and complexity. This is followed by a filtering step to select high-quality data using automatic metrics, which are then used for LLM training. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, when evaluated on MATH500, AMC23, AIME24, and GPQA-Diamond.

The transformative rise of Large Language Models (LLMs) has initiated a substantial paradigm shift in the domain of deep learning (Zhang et al., 2023; Guo et al., 2023; Long et al., 2024). The development of such models emphasizes scale, and relies heavily on large volumes of high-quality data (Gandhi et al., 2024; Abdin et al., 2024). However, acquiring such data from human sources can often be challenging or even impractical due to factors such as high costs, data scarcity, and privacy concerns (Kurakin et al., 2023). Furthermore, several studies (Hosking et al., 2023; Singh et al., 2023; Gilardi et al., 2023) have pointed out that human-generated data, being inherently prone to biases and errors, may not always be ideal for model training or evaluation. In this context, synthetic data emerges as a viable alternative for obtaining high-quality datasets. Synthetic data is artificially generated to replicate the characteristics and patterns of real-world data. One innovative approach to creating such data is the Self-Instruct method (Wang et al., 2022a), which utilizes LLMs themselves to generate instruction-following examples. This method begins by selecting a small set of seed instruction-following samples, which are then used to prompt LLMs to produce additional demonstrations in a similar format.
Since then, a number of variants have been introduced that increase the complexity of queries (Liu et al., 2023; Zeng et al., 2024), maintain semantic diversity (Ding et al., 2023), scale the synthetic data (Yuan et al., 2023), and use these methods in self-improvement loops (Yuan et al., 2024).
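The generate-then-filter loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `call_llm` and `quality_score` are hypothetical stand-ins for a real LLM call and a real automatic metric (e.g. answer-consistency or a reward model).

```python
def call_llm(prompt):
    # Placeholder: a real system would query an LLM with the CoT prompt here.
    return ("Reasoning: the seeds are short algebra problems.\n"
            "New task: If 3x + 5 = 20, what is x?")

def quality_score(task):
    # Placeholder automatic metric used by the filtering step.
    return 1.0 if "New task:" in task else 0.0

def cot_self_instruct(seed_tasks, n_candidates=4, threshold=0.5):
    # Ask the model to reason about the seeds before writing a new task,
    # then keep only candidates that pass the automatic filter.
    prompt = (
        "Think step by step about the style and difficulty of these tasks, "
        "then write one new task of similar quality:\n"
        + "\n".join(f"- {t}" for t in seed_tasks)
    )
    candidates = [call_llm(prompt) for _ in range(n_candidates)]
    return [c for c in candidates if quality_score(c) >= threshold]

synthetic = cot_self_instruct(["Solve 2x + 1 = 7.", "Find the GCD of 18 and 24."])
```

In practice the filter discards a substantial fraction of candidates; the surviving examples form the training set.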
Better Alignment with Instruction Back-and-Forth Translation
Nguyen, Thao, Li, Jeffrey, Oh, Sewoong, Schmidt, Ludwig, Weston, Jason, Zettlemoyer, Luke, Li, Xian
We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and the two generated text distributions exhibit significant distinction in embedding space. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall we find that instruction back-and-forth translation combines the best of both worlds -- making use of the information diversity and quantity found on the web, while ensuring the quality of the responses which is necessary for effective alignment.
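The two-step pipeline in this abstract can be outlined as below. Both steps are stubbed: `backtranslate` (propose an instruction for which the document is a good response) and `rewrite_response` (clean the document into a direct answer) are hypothetical placeholders for the actual LLM calls.

```python
def backtranslate(document):
    # Placeholder: an LLM would generate a plausible instruction for this doc.
    return "Explain the topic covered by the following notes."

def rewrite_response(document, instruction):
    # Placeholder: an LLM rewrites the raw web document into a clean answer.
    return f"Here is a clear answer based on the source:\n{document.strip()}"

def back_and_forth(corpus):
    # Produce (backtranslated instruction, rewritten response) pairs.
    pairs = []
    for doc in corpus:
        instruction = backtranslate(doc)
        response = rewrite_response(doc, instruction)
        pairs.append({"instruction": instruction, "response": response})
    return pairs

pairs = back_and_forth(["Raw web text about photosynthesis."])
```

The key design choice is that the response stays grounded in the source document rather than being distilled from a stronger model, which is what preserves the diversity of web text.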
Privacy-Preserving Instructions for Aligning Large Language Models
Yu, Da, Kairouz, Peter, Oh, Sewoong, Xu, Zheng
Service providers of large language model (LLM) applications collect user instructions in the wild and use them in further aligning LLMs with users' intentions. These instructions, which potentially contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk not addressed by the typical private optimization. To this end, we propose using synthetic instructions to replace real instructions in data annotation and model fine-tuning. Formal differential privacy is guaranteed by generating those synthetic instructions using privately fine-tuned generators. Crucial in achieving the desired utility is our novel filtering algorithm that matches the distribution of the synthetic instructions to that of the real ones. In both supervised fine-tuning and reinforcement learning from human feedback, our extensive experiments demonstrate the high utility of the final set of synthetic instructions by showing comparable results to real instructions. In supervised fine-tuning, models trained with private synthetic instructions outperform leading open-source models such as Vicuna.
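The distribution-matching filter this abstract highlights can be illustrated with a toy version: select privately generated synthetic instructions so that their topic histogram matches that of the real instructions. Topic labels here stand in for the embedding-based matching a real system would use; all names are illustrative, not the authors' API.

```python
from collections import Counter

def match_distribution(synthetic, real_topics, budget):
    """Select up to `budget` synthetic items matching real topic proportions.

    synthetic: list of (text, topic) pairs from the DP-fine-tuned generator.
    real_topics: topic labels of the (private) real instructions.
    """
    target = Counter(real_topics)
    total = sum(target.values())
    # Per-topic quota proportional to the real distribution.
    quota = {t: round(budget * c / total) for t, c in target.items()}
    selected = []
    for text, topic in synthetic:
        if quota.get(topic, 0) > 0:
            selected.append(text)
            quota[topic] -= 1
    return selected

chosen = match_distribution(
    [("write code", "code"), ("fix bug", "code"),
     ("tell a joke", "chat"), ("more code", "code")],
    real_topics=["code", "code", "chat"],
    budget=3,
)
```

Note that in the actual method the quota computation would itself have to be done under differential privacy; this sketch ignores that accounting.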
AIGeN: An Adversarial Approach for Instruction Generation in VLN
Rawal, Niyati, Bigazzi, Roberto, Baraldi, Lorenzo, Cucchiara, Rita
In the last few years, the research interest in Vision-and-Language Navigation (VLN) has grown significantly. VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal. Recent work in literature focuses on different ways to augment the available datasets of instructions for improving navigation performance by exploiting synthetic training data. In this work, we propose AIGeN, a novel architecture inspired by Generative Adversarial Networks (GANs) that produces meaningful and well-formed synthetic instructions to improve navigation agents' performance. The model is composed of a Transformer decoder (GPT-2) and a Transformer encoder (BERT). During the training phase, the decoder generates sentences for a sequence of images describing the agent's path to a particular point while the encoder discriminates between real and fake instructions. Experimentally, we evaluate the quality of the generated instructions and perform extensive ablation studies. Additionally, we generate synthetic instructions for 217K trajectories using AIGeN on Habitat-Matterport 3D Dataset (HM3D) and show an improvement in the performance of an off-the-shelf VLN method. The validation analysis of our proposal is conducted on REVERIE and R2R and highlights the promising aspects of our proposal, achieving state-of-the-art performance.
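The adversarial setup described above (decoder generates instructions, encoder discriminates real from fake) can be outlined as a single training round. Both models are trivial stubs standing in for the GPT-2 decoder and BERT encoder the paper uses; the function names are illustrative only.

```python
import random

def generator(trajectory, rng):
    # Stub decoder: emit an instruction for a sequence of path images.
    return "Walk down the hallway and stop at the second door."

def discriminator(instruction, rng):
    # Stub encoder: probability that the instruction is human-written.
    return rng.random()

def adversarial_round(trajectories, rng):
    # The generator is rewarded when its instructions fool the
    # discriminator; here we just report the mean realness score.
    fakes = [generator(t, rng) for t in trajectories]
    scores = [discriminator(f, rng) for f in fakes]
    return sum(scores) / len(scores)

score = adversarial_round(["traj_a", "traj_b"], random.Random(0))
```

In the real system the two networks are updated in alternation, and after training the generator alone is used to label 217K unannotated trajectories.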
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Kamath, Aishwarya, Anderson, Peter, Wang, Su, Koh, Jing Yu, Ku, Alexander, Waters, Austin, Yang, Yinfei, Baldridge, Jason, Parekh, Zarana
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions. However, given the scarcity of human instruction data and limited diversity in the training environments, these agents still struggle with complex language grounding and spatial language understanding. Pretraining on large text and image-text datasets from the web has been extensively explored but the improvements are limited. We investigate large-scale augmentation with synthetic instructions. We take 500+ indoor environments captured in densely-sampled 360-degree panoramas, construct navigation trajectories through these panoramas, and generate a visually-grounded instruction for each trajectory using Marky, a high-quality multilingual navigation instruction generator. We also synthesize image observations from novel viewpoints using an image-to-image GAN. The resulting dataset of 4.2M instruction-trajectory pairs is two orders of magnitude larger than existing human-annotated datasets, and contains a wider variety of environments and viewpoints. To efficiently leverage data at this scale, we train a simple transformer agent with imitation learning. On the challenging RxR dataset, our approach outperforms all existing RL agents, improving the state-of-the-art NDTW from 71.1 to 79.1 in seen environments, and from 64.6 to 66.8 in unseen test environments. Our work points to a new path to improving instruction-following agents, emphasizing large-scale imitation learning and the development of synthetic instruction generation capabilities.
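The imitation-learning side of this pipeline reduces to behavior cloning on the synthetic instruction-trajectory pairs: predict the demonstrated action at each step. The "policy" below is a trivial lookup table purely for illustration; the paper trains a transformer over panoramic observations, and none of these function names come from the source.

```python
def train_bc(demos):
    # Behavior cloning over demonstrations:
    # demos is a list of (instruction, [(observation, action), ...]) pairs.
    policy = {}
    for instruction, steps in demos:
        for obs, action in steps:
            policy[(instruction, obs)] = action
    return policy

def act(policy, instruction, obs, default="stop"):
    # Query the cloned policy; fall back to a safe default action.
    return policy.get((instruction, obs), default)

demos = [("go to the door", [("hall_view", "forward"), ("door_view", "stop")])]
policy = train_bc(demos)
```

The point of the 4.2M-pair dataset is precisely that simple supervised imitation, given enough diverse demonstrations, can outperform RL agents trained on scarce human data.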
Instruction-driven history-aware policies for robotic manipulations
Guhur, Pierre-Louis, Chen, Shizhe, Garcia, Ricardo, Tapaswi, Makarand, Laptev, Ivan, Schmid, Cordelia
In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.
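The three inputs the abstract enumerates (language, multi-view observations, full history) can be pictured as the record a policy step consumes. This is a schematic data structure only, with illustrative names; feature extraction and the transformer itself are out of scope.

```python
def build_policy_input(instruction, view_features, history):
    """Assemble one policy step's input.

    instruction: natural-language command.
    view_features: one feature vector per camera view.
    history: full episode so far, as (observation, action) pairs.
    """
    return {
        "language": instruction.split(),
        "views": list(view_features),
        "history": [(obs, act) for obs, act in history],
    }

step_input = build_policy_input(
    "pick up the cup",
    [[0.1, 0.2], [0.3, 0.4]],          # two camera views
    [("obs_0", "reach")],              # one prior step
)
```

Keeping the entire history in the input is what lets the transformer learn dependencies between past actions and the instruction, rather than acting on the current frame alone.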
Episodic Transformer for Vision-and-Language Navigation
Pashevich, Alexander, Schmid, Cordelia, Sun, Chen
Interaction and navigation defined by natural language instructions in dynamic environments pose significant challenges for neural agents. This paper focuses on addressing two challenges: handling long sequences of subtasks, and understanding complex human instructions. We propose Episodic Transformer (E.T.), a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. To improve training, we leverage synthetic instructions as an intermediate representation that decouples understanding the visual appearance of an environment from the variations of natural language instructions. We demonstrate that encoding the history with a transformer is critical to solving compositional tasks, and that pretraining and joint training with synthetic instructions further improve the performance. Our approach sets a new state of the art on the challenging ALFRED benchmark, achieving 38.4% and 8.5% task success rates on seen and unseen test splits.
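The episodic encoding idea can be sketched as flattening the instruction plus the whole episode history into one token sequence that the transformer attends over. Tokenization and the model are stubbed; the separator tokens below are illustrative, not E.T.'s actual vocabulary.

```python
def build_episode_sequence(instruction, history):
    """Flatten an episode into one sequence.

    history: list of (observation, action) pairs, oldest first.
    """
    tokens = ["[INSTR]"] + instruction.split()
    for obs, act in history:
        # Interleave every past observation with the action taken after it,
        # so attention can reach any point in the episode.
        tokens += ["[OBS]", obs, "[ACT]", act]
    return tokens

seq = build_episode_sequence(
    "put the mug in the sink",
    [("kitchen_view", "move_forward"), ("counter_view", "pick_up")],
)
```

Because the full history is in-context, compositional tasks (e.g. "put it back where you found it") can be solved without an external memory module.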